7 research outputs found

    Neural machine translation of literary texts from English to Slovene

    Neural machine translation (NMT) has shown promising performance on literary texts. Since literary machine translation has not yet been researched for the English-to-Slovene translation direction, this paper aims to fill this gap by presenting a comparison among bespoke NMT models, tailored to novels, and Google Neural Machine Translation. The translation models were evaluated by the BLEU and METEOR metrics, assessment of fluency and adequacy, and measurement of the post-editing effort. The findings show that all evaluated approaches resulted in an increase in translation productivity. The translation model tailored to a specific author outperformed the model trained on a more diverse literary corpus on all metrics except the scores for fluency. However, the translation model by Google still outperforms all bespoke models. The evaluation reveals very low inter-rater agreement on fluency and adequacy, based on the kappa coefficient values, and significant discrepancies between post-editors. This suggests that these methods might not be reliable, which should be addressed in future studies. This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 (Insight), co-funded by the European Regional Development Fund. Peer-reviewed.
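The abstract reports inter-rater agreement through kappa coefficient values but does not say which variant was used. As a rough illustration only, the sketch below computes Cohen's kappa for two raters scoring the same segments; the function name `cohen_kappa` and the sample ratings are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items.

    Illustrative only: the paper may have used a different kappa variant.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(ratings_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical fluency scores (1-4) from two raters for six segments.
rater_1 = [4, 3, 3, 2, 4, 1]
rater_2 = [4, 2, 3, 3, 3, 1]
kappa = cohen_kappa(rater_1, rater_2)
```

Values near 0 indicate agreement no better than chance, which is the situation the abstract describes as "very low" agreement.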

    Automatic extraction of word phrases from a corpus using the SSJ lexicon (Avtomatsko pridobivanje besednih zvez iz korpusa z uporabo leksikona SSJ)

    Computational lexicography is an interdisciplinary field that focuses primarily on the automation of lexicographic procedures and the building of lexical databases of various kinds. In this paper we describe the automatic extraction of word phrases from a text corpus (phrases consisting of a noun preceded by an adjective that agrees with it in gender, case, and number) and the transformation of the extracted lexical data into a syntactically suitable final form by means of the SSJ morphological lexicon.
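The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the `Token` structure, the tag values, the toy `lexicon` dictionary, and the choice of the nominative singular as the "final form" are all assumptions made for this example. A real system would read tags from an annotated corpus and look up forms in the SSJ lexicon.

```python
from typing import NamedTuple


class Token(NamedTuple):
    form: str    # surface form as found in the corpus
    lemma: str   # dictionary form
    pos: str     # hypothetical tag set: "ADJ", "NOUN", ...
    gender: str
    case: str
    number: str


def agreeing_pairs(tokens):
    """Return (adjective, noun) pairs of adjacent tokens that agree
    in gender, case, and number."""
    pairs = []
    for adj, noun in zip(tokens, tokens[1:]):
        same_features = (adj.gender, adj.case, adj.number) == (
            noun.gender, noun.case, noun.number)
        if adj.pos == "ADJ" and noun.pos == "NOUN" and same_features:
            pairs.append((adj, noun))
    return pairs


def canonical_phrase(adj, noun, lexicon):
    """Build a citation form: the adjective in the nominative singular of
    the noun's gender (looked up in a toy lexicon) plus the noun lemma.
    The nominative singular target is an assumption of this sketch."""
    adj_form = lexicon[(adj.lemma, noun.gender, "nom", "sg")]
    return f"{adj_form} {noun.lemma}"


# Toy input: "v lepi hiši" ("in a beautiful house"), locative singular.
sentence = [
    Token("v", "v", "ADP", "", "", ""),
    Token("lepi", "lep", "ADJ", "f", "loc", "sg"),
    Token("hiši", "hiša", "NOUN", "f", "loc", "sg"),
]
# Hypothetical stand-in for a morphological lexicon lookup.
lexicon = {("lep", "f", "nom", "sg"): "lepa"}

phrases = [canonical_phrase(a, n, lexicon) for a, n in agreeing_pairs(sentence)]
```

Keeping extraction and form generation as separate functions mirrors the two stages named in the abstract: finding agreeing adjective-noun pairs, then normalising them with a lexicon.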

    Post-edited and error annotated machine translation corpus PE²rr 1.0

    The PE²rr corpus contains source language texts from different domains along with their automatically generated translations into several morphologically rich languages, their post-edited versions, and error annotations of the performed post-edit operations. The main advantage of the corpus is the fusion of the post-editing and error classification tasks, which have usually been treated as two independent tasks even though they are naturally related.

    Back-translation approach for code-switching machine translation: A case study

    Recently, machine translation has demonstrated significant progress in terms of translation quality. However, most research has focused on translating purely monolingual texts on the source and target sides of parallel corpora, when in fact code-switching is very common in communication nowadays. Despite the importance of handling code-switching in the translation task, existing machine translation systems fail to accommodate code-switched content. In this paper, we examine the phenomenon of code-switching in machine translation for low-resource languages. Through different approaches, we evaluate the performance of our systems and make some observations about the role of code-mixing in the available corpora. This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under grant agreement number SFI/12/RC/2289_P2, co-funded by the European Regional Development Fund, and the Enterprise Ireland (EI) Innovation Partnership Programme under grant number IP20180729, NURS – Neural Machine Translation for Under-Resourced Scenarios. Non-peer-reviewed.

    Neural machine translation of literary texts from English to Slovene

    Neural machine translation (NMT) has shown promising performance on literary texts. Since literary machine translation has not yet been researched for the English-to-Slovene translation direction, this paper aims to fill this gap by presenting a comparison among bespoke NMT models, tailored to novels, and Google Neural Machine Translation. The translation models were evaluated by the BLEU and METEOR metrics, assessment of fluency and adequacy, and measurement of the post-editing effort. The findings show that all evaluated approaches resulted in an increase in translation productivity. The translation model tailored to a specific author outperformed the model trained on a more diverse literary corpus on all metrics except the scores for fluency. However, the translation model by Google still outperforms all bespoke models. The evaluation reveals very low inter-rater agreement on fluency and adequacy, based on the kappa coefficient values, and significant discrepancies between post-editors. This suggests that these methods might not be reliable, which should be addressed in future studies. This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 (Insight), co-funded by the European Regional Development Fund.

    Back-translation approach for code-switching machine translation: A case study

    Recently, machine translation has demonstrated significant progress in terms of translation quality. However, most research has focused on translating purely monolingual texts on the source and target sides of parallel corpora, when in fact code-switching is very common in communication nowadays. Despite the importance of handling code-switching in the translation task, existing machine translation systems fail to accommodate code-switched content. In this paper, we examine the phenomenon of code-switching in machine translation for low-resource languages. Through different approaches, we evaluate the performance of our systems and make some observations about the role of code-mixing in the available corpora. This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under grant agreement number SFI/12/RC/2289_P2, co-funded by the European Regional Development Fund, and the Enterprise Ireland (EI) Innovation Partnership Programme under grant number IP20180729, NURS – Neural Machine Translation for Under-Resourced Scenarios.